Improved Densification of One Permutation Hashing

Authors

Anshumali Shrivastava

Ping Li

Abstract

The existing work on densification of one permutation hashing [24] reduces the query processing cost of the (K,L)-parameterized Locality Sensitive Hashing (LSH) algorithm with minwise hashing from O(dKL) to merely O(d + KL), where d is the number of nonzeros of the data vector, K is the number of hashes in each hash table, and L is the number of hash tables. While that is a substantial improvement, our analysis reveals that the existing densification scheme in [24] is sub-optimal. In particular, there is not enough randomness in that procedure, which hurts its accuracy on very sparse datasets. In this paper, we provide a new densification procedure that is provably better than the existing scheme [24]. The improvement is more significant for very sparse datasets, which are common on the web. The improved technique has the same O(d + KL) query processing cost, making it strictly preferable to the existing procedure. Experimental evaluations on public datasets, in the task of hashing-based near neighbor search, support our theoretical findings.
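The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that one permutation hashing splits the permuted index range into k bins and keeps the smallest offset in each bin, and that densification fills each empty bin by borrowing from the nearest non-empty bin. The extra randomness is modeled here as one random direction bit per bin, shared by all vectors. The function name `oph_densified`, its parameters, and the offset constant `C` are hypothetical choices made for illustration, and the fill loop is written for clarity rather than to achieve the stated cost.

```python
import random

def oph_densified(nonzeros, dim, k, seed=0):
    """Sketch: one permutation hashing into k bins, then fill empty bins
    by borrowing from the nearest non-empty bin in a randomly chosen
    circular direction. Names and constants are illustrative assumptions."""
    rng = random.Random(seed)          # same seed => same permutation and bits
    perm = list(range(dim))
    rng.shuffle(perm)                  # the single shared permutation
    # One random direction per bin (assumed source of extra randomness).
    direction = [rng.choice((-1, 1)) for _ in range(k)]

    bin_size = -(-dim // k)            # ceil(dim / k)
    bins = [None] * k
    for i in nonzeros:
        b, offset = divmod(perm[i], bin_size)
        if bins[b] is None or offset < bins[b]:
            bins[b] = offset           # keep the minimum offset per bin

    if all(v is None for v in bins):
        raise ValueError("vector has no nonzeros")

    C = bin_size + 1                   # offset larger than any in-bin value
    hashes = []
    for j in range(k):
        idx, steps = j, 0
        while bins[idx] is None:       # walk circularly to a non-empty bin
            idx = (idx + direction[j]) % k
            steps += 1
        hashes.append(bins[idx] + steps * C)
    return hashes
```

Two vectors hashed with the same `seed`, `dim`, and `k` can then be compared bin by bin, with the fraction of matching positions serving as a rough similarity estimate.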


Related resources

Optimal Densification for Fast and Accurate Minwise Hashing

Minwise hashing is a fundamental and one of the most successful hashing algorithms in the literature. Recent advances based on the idea of densification (Shrivastava & Li, 2014a;c) have shown that it is possible to compute k minwise hashes, of a vector with d nonzeros, in mere (d + k) computations, a significant improvement over the classical O(dk). These advances have led to an algorithmic impr...


One Permutation Hashing

Minwise hashing is a standard procedure in the context of search, for efficiently estimating set similarities in massive binary data such as text. Recently, b-bit minwise hashing has been applied to large-scale learning and sublinear time near neighbor search. The major drawback of minwise hashing is the expensive preprocessing, as the method requires applying (e.g.,) k = 200 to 500 per...


Densifying One Permutation Hashing via Rotation for Fast Near Neighbor Search

The query complexity of locality sensitive hashing (LSH) based similarity search is dominated by the number of hash evaluations, and this number grows with the data size (Indyk & Motwani, 1998). In industrial applications such as search where the data are often high-dimensional and binary (e.g., text n-grams), minwise hashing is widely adopted, which requires applying a large number of permutat...


Sufficient conditions for sound hashing using a truncated permutation

In this paper we give a generic security proof for hashing modes that make use of an underlying fixed-length permutation. We formulate a set of five simple conditions, which are easy to implement and to verify, for such a hashing mode to be sound. These hashing modes include tree hashing modes and sequential hashing modes. We provide a proof that for any hashing mode satisfying the five conditi...


Revisiting Winner Take All (WTA) Hashing for Sparse Datasets

WTA (Winner Take All) hashing has been successfully applied in many large scale vision applications. This hashing scheme was tailored to take advantage of the comparative reasoning (or order based information), which showed significant accuracy improvements. In this paper, we identify a subtle issue with WTA, which grows with the sparsity of the datasets. This issue limits the discriminative po...




Publication date: 2014
